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Extreme value data with a high clump-at-zero occur in many 
domains. Moreover, it might happen that the observed data are ei- 
ther truncated below a given threshold and/or might not be reliable 
enough below that threshold because of the recording devices. These 
situations occur, in particular, with radio audience data measured 
using personal meters that record environmental noise every minute, 
that is then matched to one of the several radio programs. There are 
therefore genuine zeros for respondents not listening to the radio, but 
also zeros corresponding to real listeners for whom the match between 
the recorded noise and the radio program could not be achieved. Since 
radio audiences are important for radio broadcasters in order, for ex- 
ample, to determine advertisement price policies, possibly according 
to the type of audience at different time points, it is essential to be 
able to explain not only the probability of listening to a radio but also 
the average time spent listening to the radio by means of the charac- 
teristics of the listeners. In this paper we propose a generalized linear 
model for zero-inflated truncated Pareto distribution (ZITPo) that 
we use to fit audience radio data. Because it is based on the gener- 
alized Pareto distribution, the ZITPo model has nice properties such 
as model invariance to the choice of the threshold and from which 
a natural residual measure can be derived to assess the model fit to 
the data. From a general formulation of the most popular models for 
zero-inflated data, we derive our model by considering successively 
the truncated case, the generalized Pareto distribution and then the 
inclusion of covariates to explain the nonzero proportion of listeners 
and their average listening time. By means of simulations, we study 
the performance of the maximum likelihood estimator (and derived 
inference) and use the model to fully analyze the audience data of a 
radio station in a certain area of Switzerland. 
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1. Introduction. Audience indicators — like rating, 2 time spent listening 3 
and market share — are essential for radio stations managers and advertis- 
ers. They give important indications on public profiles and on radio stations 
benchmarking, allowing proper radio programming and optimization of ad- 
vertising strategies. The weaknesses of traditional audience measurements 
methods based on individual recollection of the time spent listening to all 
radio stations led to the development of individual, portable and passive 
electronic measurement systems providing more reliable and detailed mea- 
sures [refer to Webster, Phalen and Lichty (2006) for a complete overview 
of audience measurement methods]. Telecontrol 4 thus developed a "wrist- 
watch meter," which records 4 seconds of ambient sound at fix time delays 
and compares these sequences to the corresponding ones of all available ra- 
dios. The "people portable meter" of Arbitron 5 or the "Eurisko multimedia 
monitor" of Gfk 6 consist in a pager-sized device which detects inaudible 
codes that broadcasters embed in their programs. 

Hence, the fundamental audience measure available through these portable 
and passive measurement systems is a dichotomous variable Y% S mt indicating 
if the participant i was listening to the radio station s at the measurement 
m of the day t. Most used audience indicators for a given radio station are 
all functions of the sum of those quantities over a day part, mostly 24 hours, 
that is, Yi s t = ^2\f m Yi sm t. 

We have at our disposal radio audience data of the Swiss measurement 
system "Radiocontrol" in 2007 [refer to Dahler (2006) for a complete presen- 
tation of this measurement system in Switzerland] . As illustrated in Figure 1 , 
the distribution of the daily number of listening minutes yi s t for a given ra- 
dio is extremely skewed, left-truncated and clumped-at-zero. In other words, 
first, the empirical distribution of the data appears monotonically decreas- 
ing. The probability to listen to a radio during a time interval decreases 
with the time interval length. Second, because of contact validation rules of 
the Swiss measurement system, the listening times yi s t are recorded as zeros 
if none of the contacts of the participant i to the radio station s last 3-4 
minutes or more on day t. This means that the smallest observed (recorded 
as such) listening times are 3-4 minutes. This ensures that the probabil- 
ity to observe false positive contact is negligible over a time interval of 4 
or more consecutive minutes. Third, the data contain a high clump-at-zero 
corresponding to the percentage of people that had no recorded contact with 
that radio station. 



2 Percentage of people who tune in to a given radio station during a day. 
3 Average listening time to a given radio station per listener, 
http : //www. telecontrol . ch. 
5 http : //www. arbitron. com. 
6 http : //www. gfk. com. 
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Data with a clump-at-zero and an asymmetric heavy-tail distribution oc- 
cur in numerous disciplines. Examples are the daily levels of precipitation in 
an area [Weglarczyk, Strupczewski and Singh (2005)], the yearly amount of 
car insurance claims per client [Chapados et al. (2002); Christmann (2004)] 
or the length of overnight stays at hospital per patient [Chen, Jiang and 
Mao (2007)]. However, no model has been proposed so far for data with a 
clump-at-zero together with a truncation of small values under a threshold, 
a model that is necessary to describe, in particular, radio audience data like 
in our example, but also any other type of data that might, for example 
for recording reasons, have unreliable measurements at small values of the 
variable of interest. Hence, the purpose of this paper is to develop a model 
able to fit truncated heavy-tailed data with excess zeros and to explain, by 
means of covariates, both the probability associated with a nonzero value 
and the expectation of positive outcomes. Such a model particularly makes 
sense in the context of radio audience: The probability of a nonnull value and 
the expectation of positive outcomes, respectively, correspond to the rating 
and time spent listening audience indicators. Market shares are a function 
of these expectations. 

Models for data with excess zeros have received much attention in the 
literature. The most popular ones include the two-part model of Duan et al. 
(1983) and the zero-inflated count models initiated by Lambert (1992) for 
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Clump-at-zero Distribution of strictly positive listening times (minutes) 

Fig. 1. Empirical distribution of the daily listening times to a national radio in an area 
of the French part of Switzerland during the first semester of 2006. 1382 participants were 
measured by means of the Radiocontrol system during one day of the period of interest. 
Zeros represent 65. 7% of the data. The distribution of the positive data is extremely skewed 
with a maximum daily listening time of 1136 minutes. The lowest possible positive listening 
time is 3 minutes. 
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continuous data, or the hurdle model of Mullahy (1986) for count data. In 
Section 2 we describe our model as a natural extension of these models that 
take into account the left truncation of the outcome variable. To model the 
positive part of the radio listening times, we propose a zeromodal Pareto-like 
distribution. Choice has been made for the generalized Pareto distribution 
because of its ability to fit heavy tails, to be "model invariant" to the choice 
of the threshold for the left truncation, and because it can be used to only 
model the tail of the distribution. The resulting model we propose is hence 
a zero-inflated truncated Pareto (ZITPo) model in which the probability of 
nonzero outcomes and the mean of the positive outcomes are linked to a 
set of covariates in a generalized linear model framework. The ZITPo has 
great fitting flexibility and useful properties as argued in Section 2.5. In 
Section 3 we investigate by means of simulations the sample properties of 
the maximum likelihood estimator and inferential procedures. Since ZITPo 
models are new, it is also important to be able to check the fit of the model 
and, therefore, we propose in Section 4 a new data analysis tool based on 
Pareto residuals that is derived in a natural manner from the properties 
of the ZITPo model. The data from a radio station in a certain area of 
Switzerland are then fully analyzed in Section 5 by means of the ZITPo 
which provides an excellent fit to the data and hence good explanatory 
power for the probability of nonzero outcomes and the mean of the positive 
outcomes. 

2. The ZITPo model. The generalized Pareto distribution, introduced 
by Pickands (1975), is a limit distribution for the excess over a (large) thresh- 
old a for data coming from generalized extreme value distributions, as well 
as a generalization of the Pareto distribution. The three parameter general- 
ized Pareto distribution has the following cumulative distribution function: 



where a, r and £ are location, scale and shape parameters, a > and r > 0. 
The range of y is ]a, — if £ < 0, and ]a, oo[ otherwise. The exponential 
distribution with mean r occurs for £ = 0. Pareto-like distributions occur for 
£ > 0. The generalized Pareto distribution has been widely used to model 
rare events in several fields. Applications for environmental extremes are 
especially numerous (river flow, ozone levels, earthquakes). 

For modeling audience radio data, it is also important to be able to link 
moments or parameters of the generalized Pareto distribution to a set of 
explanatory variables. The generalized linear models (GLM) framework, in- 
troduced by Nelder and Wedderburn (1972), provides a general setting to 



(1) 
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achieve this aim. GLM are a generalization of the linear regression model 
in which the assumption of normality of the conditional distribution of the 
response vector y given a set of covariates X, y|X, is relaxed. These models 
assume that the ith unit response, yi, follows a distribution belonging to the 
exponential family, and the expectation of the ith response, yi, is linked to a 
set of fixed covariates Xj through an invertible linear predictor function is(-), 
by means of Efl^] = i/~ 1 (xj/3), with (3 a set of regression coefficients. The 
generalized Pareto distribution falls outside the exponential family frame- 
work and, hence, the advantages associated with this framework — like well- 
known iterative estimation procedures and mathematical properties — are 
not available. However, extension of the GLM to distributions outside the 
exponential family is pretty straightforward. 

Actually, generalized linear modeling has existed for a long time with 
responses following extreme value distributions, but not in the traditional 
scheme that directly relates the response expectation to the explanatory 
variables through a linear predictor. Indeed, in extreme value analyses, very 
often the parameters of the response distribution instead of the response 
expectation are linked to the covariates. Davison and Smith [(1990), page 
395] consider that this represents "a more fruitful approach" than the usual 
one that links the distribution moments to the regressors, as the moments 
of generalized extreme value distribution do not exist for all values of their 
parameters. We refer to Coles [(2001), Section 6.4] for a review. In survival 
analysis, depending on the choice of the hazard function h(t), the survival 

function f(t) may follow an extreme value distribution. In this context, 

f(t) 

the hazard function h(t) = fz^m 1S then related to the covariates through 
a linear predictor instead of the response expectation. Such developments 
may be found in Aitkin and Clayton (1980). As we will see in more details 
below, for the purpose of modeling radio audience data, it is more sensible 
to link the expected value of the response to a set of covariates. 

Before adapting the generalized Pareto distribution to handle clump-at- 
zero and left truncation of the positive part of the data, as well as incorpo- 
rating in the resulting model covariates in order to explain the probability 
of a zero outcome and the mean of the positive part, we briefly describe 
models proposed so far for data with excess zeros. The aim is to propose a 
general formulation from which different models for different situations can 
be deduced, and, in particular, from which we build our zero-inflated trun- 
cated Pareto (ZITPo) model. We then also describe in details the ZITPo 
model assumptions and discuss some possible extensions. 

2.1. Models for nonnegative data with excess zeros. There is a rich liter- 
ature about adaptation of statistical models to the case of data with excess 
zeros. We refer to Min and Agresti (2002, 2005) and Ridout, Demetrio and 
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Hinde (1998) for a review. Min and Agresti (2002) compare the advantages 
and disadvantages of existing approaches and note that the most appeal- 
ing modeling for continuous data with excess zeros is the two-part model 
of Duan et al. (1983), and the zero-inflated count models initiated by Lam- 
bert (1992) or the hurdle model of Mullahy (1986) in the case of count data 
with a clump-at-zero. These models are similar. Their key idea is to mix 
two random variables: A first one, Y\, that handles the excess of zeros, and 
a second one, Y2, that models the other part of the data. Y\ typically fol- 
lows a Bernoulli distribution where Py 1 (0) = 1 — tt denotes the probability 
to observe a zero outcome. In the hurdle and two-part models (also called 
conditional models), the probability of the data being equal to zero only 
depends on Y\ and the positive data are all modeled by Y2, which may fol- 
low a zero-truncated distribution in the case of count data (hurdle model) 
or a continuous distribution (two-part model). In these cases, Py 2 (0) = 0. 
In zero-inflated models (also called mixture models), Y2 does not follow a 
zero-truncated distribution. The probability associated to zero thus depends 
on both Y\ and Y2 • 

Let Y be a random variable with probability distribution Py for the 
clump-at-zero and the positive part, when the latter is discrete, that is, Y2 
is discrete, then Py may be expressed in the following way: 



where y = 0,1,2, ... , the indicator function «,(•) equals one if the condition is 
true and zero otherwise. Let us refer to a variable as semicontinuous when 
it has a point mass in zero and a continuous distribution for the positive 
values [definition of Min and Agresti (2002), page 7]. Then (2) may easily 
be generalized to continuous or semicontinuous Y2: 



where 5(y) is a Dirac delta function which equals zero for y 7^ 0, Ao(y) is 
a step function taking the value of one for y > and zero otherwise, and 
y £ [0,oo[. Note that when Py 2 (0) = 0, we have the hurdle or two parts 
models, while we have zero-inflated models when this is not the case. 

The use of the generalized Pareto distribution to model zero-inflated data 
is not common, one exception being Weglarczyk, Strupczewski and Singh 
(2005). The authors compare the fitting ability of some semicontinuous dis- 
tributions to fit hydrological data with excess zeros and consider a Dirac 
generalized Pareto distribution with density function 



(2) 



P Y (y) = [P Yl (0) + (1 - P Yl (0))Pr*(vMv = 0) 

+ [(i-JV 1 (o))JV a (y)]t(y>o), 



(3) 



My) = [PyM + (1 - P Yl (o))PY 2 (oMy) 

+ [(l-P yi (0))/y 2 (y)]Ao(y), 
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where r > 0, £ 7^ 0, < (1 — tt) < 1 corresponds to the probability of a 
zero event. Note that compared to (1), a = 0. The Dirac generalized Pareto 
distribution in (4) thus corresponds to a two-part model with Py 2 (0) = 0, in 
which /y 2 (y) is the density function of the generalized Pareto distribution. 

In the following sections we propose to extend (3) [and (4)] to take into 
account the possible truncation of small values, as well as to incorporate 
covariates to explain (a function of) the probability of zero outcomes and 
the mean distribution of positive outcomes. 

2.2. The ZITPo distribution. Let Y* denote the effective (but unknown) 
daily listening time for a given radio. Y* is to the sum over the day of 
the dichotomous variables indicating a contact to that radio station minute 
by minute. The probability and cumulative distribution functions of Y* , 
fY*(y*) an d Fy*(y*), are semicontinuous with a point mass in zero and a 
continuous distribution for the positive values. Let Y denote the observed 
listening times with density function /y(y). As listening times smaller than 
a given value y° (considered as known) are recorded as zeros, observed zeros 
are then a mixture between the effective zero listening times and the pos- 
itive listening times reported as zeros because of the measurement system. 
Accordingly, F Y (0) = F Y * (y°). 

A semicontinuous version of the zero-inflated count model described in (3) 
is indeed adequate to model the double origins of the zeros in the clump-at- 
zero and the positive values of the observed listening times. Let us assume 
that the unknown and true proportion of zero listening times is 1 — tt, with 
< 7r < 1, and that the effective positive listening times follow a two param- 
eter generalized Pareto distribution (with a = 0), Y*\(Y* > 0) ~ GPD(r,£). 
Then, in (3), Py 1 (0) = 1 — tt corresponds to the effective proportion of non- 
listeners, and i-y 2 (0) = i ? (y*|y*>o)(y°) corresponds to the part of the two 
parameter generalized Pareto distribution that cannot be observed because 
of the measurement system limitations. The density functions of the effective 
listening times Y* and of the observed listening times Y are 



[(l-vr)+7rF ( y,|y, >0) (y o )]5(2/) 
+ [Kf(Y*\Y*>o)(y)]A y °(y) 



(5) 



/y*(y*|7r,T,£) 



A (y*) 



/y(y|7r,r,0 



(6) 
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Fig. 2. Empirical distribution function of a data set simulated from a ZITPo model 
with parameters n — 0.5, /i = £ = 0.25 and y° = F^ Y ,, Y , >Q J0.25) . The theoretical trun- 
cated and untruncated density functions are superimposed to the plot with dashed gray 
and black lines. The value of the expectations of the positive values of the truncated and 
untruncated distributions, fj° and are indicated on the x-axis. On the discrete part of 
the plot, the surfaces within the dashed gray and black boxes correspond to the theoretical 
probabilities to observe zeros when there is (dashed gray) and when there is no (black) left 
truncation of the positive part of the data. Those probabilities respectively equal 1 — 7r and 
l-n° = (l-7T) + 7TF- Y \ lY , >0) (y°). 

where < tt < 1, r > 0, £ 7^ and y° > 0. For y° = 0, (6) reduces to the 
Dirac generalized Pareto described in (4). Finally, note that if the observed 
listening times distribution in (6) has the disadvantage of being a mixture 
distribution which makes it more complex to fit, its underlying distribution 
in (5) takes the advantages of the orthogonal parameterization of the hurdle 
and two-part models and is thus easier to interpret [for a discussion on the 
orthogonal parameterization see, e.g., Welsh et al. (1996)]. Indeed, the zeros 
depend on tt, while the positive outcomes rely on the generalized Pareto 
parameters, r and £. 

Figure 2 shows the distribution of a data set simulated from a ZITPo dis- 
tribution. The theoretical untruncated and truncated distribution functions, 
respectively corresponding to (5) and (6), are respectively superimposed to 
the plot in black and dashed gray lines. On the discrete part of the plot, the 
surfaces within the dashed gray and black boxes correspond to the theoreti- 
cal probabilities to observe zeros when there is (dashed gray) and when there 
is no (black) left truncation of the positive part of the data. Those probabil- 
ities respectively equal 1 — tt and 1 — ir° = (1 — tt) + nF^,^ Y , > ^(y°). On the 

continuous part of the plot, the expectations of the truncated (,u ) and un- 
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truncated (/i) distributions are indicated. It is then clear that the expected 
value for the true listening time Y* , //, is different from the expected value 
of the truncated distribution, fj,°. For the audience data, one quantity of 
interest is fi for the untruncated distribution. 

2.3. Covariates modeling in ZITPo distribution. Adaptation of the GLM 
to models for data with excess zeros is very intuitive. The expectations of 
the distributions of Y\ and Y2 in (2) and (3) are linked to the covariates 
through adapted link functions. The logit link is often chosen to relate the 
expectation of Y\, corresponding to the probability to observe positive val- 
ues, to the covariates. The log link makes sense to connect the expectation 
of Y2, corresponding to the mean of the positive data, to the covariates, as 
this last is necessarily positive. For the ith observation, we then have 

(T\ rrr T> (V* ^. (\\ ( R \ 6X P ( X |j ^ 1 ) 

l + exp(x/ 1 /3 1 ) 

(8) W = E[Y*\Y* > 0] = ^(p&fc) = exp(xf 2 /3 2 ), 

where iq~ (•) and ^2* (•) ar e the inverse of the linear predictor functions 
linking the expectations of Y\ and Y2 in (2) and (3) to the covariates, Xji 
and Xj2 are the covariates of the ith observation that may contain the same 
predictors, and /3 1 and (3 2 are the corresponding parameters. Because of the 
orthogonal parameterization of the underlying model in (5), if we use in (7) 
and (8) two different and uncorrelated sets of covariates, Xi and X2, we then 
assume that the processes that explain the probability to observe a positive 
outcome and the expectation of a positive outcome are independent. If part 
of the covariates of Xi are present in (or correlated to) X2, 7Tj and //j will 
possibly be linked. No assumption is done about the form of the relationship 
between these quantities. 

Inclusion of covariates in (4) requires that we express the distribution 
fy(y) i n terms of the expectation of the positive values of the data. Let 
(Y*\Y* > 0) ~ GPD(t,£). Then 

/i = E[y*|y*>0] = :j-^ forl-£>0. 

The first moment of the generalized Pareto distribution, /x, thus exists for 
values of £ lower than one. Substituting r by — £) in (6) gives 
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with 0<7r<l,/i>0,^7^0 and £ < 1, y° > 0. The inclusion of the covariates 
as described in (7) and (8) is now straightforward. For the ith observation, 
we have 



/y,(yi|xii,x i2 ,/3i,/3 2 ,0 
exp(xf 1 /3 1 



(10) 



+ 



1 + exp(x/ 1 /3 1 
exp(xf 1 /3 1 ) 



1 + 



y 



l-CJ exp(x^/3 2 ; 
1 



l + exp(x/ 1 /3 1 )exp(x/ 2 /3 2 )(l-0 



1 + 



l-^exp(xf 2 /3 2 ; 



-l/f-i 



2.4. Assumptions of ZITPo models. The form of the ZITPo model im- 
plies a number of assumptions on the distribution of the positive values: 

First, the unobserved positive listening times belonging to the range ]0, y° [ 
correspond to the nonobserved part of a left-truncated generalized Pareto 
distribution. As the generalized Pareto density function is zero modal and 
monotonically decreasing, this assumption implies that, conditionally on the 
covariates, the probability of positive listening times in the interval ]0,y°[ is 
higher than in any other interval of the same size. As zapping through radio 
is frequent, we believe that this assumption is realistic. 

Second, the expectation pn always corresponds to the quantile 1 — (1 — 
of a GPD(^j,£). Indeed, conditionally on the covariates, as the real 
positive listening times follow generalized Pareto distributions having differ- 
ent expectations Hi but sharing the same £-value, Y^*|(l^* > 0) ~ GPD(//j,£), 
we can observe that 



(11) 



\Y*\Y?>G) 



(jh) = 1- l + £ 



i-(i-0 



Figure 3 shows examples of two parameter generalized Pareto density func- 
tions sharing the same £-value (within the same graph) but having different 
expectations. For the same £-value, the density functions show a great vari- 
ety of forms and thus a high ability to model different data sets with more 
or less heavy tails. 

Third, because of the reparametrization of the generalized Pareto density 
formulated in (9), the shape parameter is restricted to values lower than 
one. This does not seem problematic in regard to (11). Indeed, for £ > 0.95, 
fj, corresponds to quantiles of the distribution higher than 0.95. We do not 
expect cases in which the theoretical mean belongs to the last 5% of the 
distribution at least with radio listening data. 
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Fourth, because of the logit link used in (7), the probability to tune into 
a given radio station conditional on covariates never equals zero or one as 
exp(x^/3 1 ) > 0. We do believe that it is reasonable to state that < ir-i < 1 
in radio audience data: 

• As radio stations broadcast almost everywhere (airports, supermarkets, 
petrol stations,...), it seems reasonable to state that the probability of 
contact of anybody is greater than zero. 

• As radio stations do not broadcast everywhere, it also seems reasonable to 
state that even the biggest fan of a specific radio station can, for example, 
be outside the broadcasting range at some specific times. 

2.5. Properties and extensions of ZITPo models. Even if there are some 
restrictions in the use of ZITPo models, the two-part form of the density 
described in (10) as well as properties of the generalized Pareto distribution 
offer to ZITPo models additional abilities to fit and analyze a variety of data 
sets, in particular, our radio audience data in Switzerland: 

First, one interesting property of ZITPo models is that y° may be chosen 
such that the observed data lower than y° integrate the most part of the false 
zero and false positive observations if the data are not completely reliable in 
the neighborhood of the truncation boundary. If all observed positive data 
inferior to y° are coded as zeros in order to belong to the clump-at-zero in 



Z~GPD(n,( = 0.1) Z ~ GPD(/i, f = 0.5) 




Fig. 3. Examples of two parameter generalized Pareto distributions. In both plots, three 
distribution functions sharing the same £-value are proposed. Their respective expectations 
are /ii = 25, ^12 = 50 and fis — 100. The probability to observe data below the expectation 
is indicated above. 
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(6), the model will estimate the parameters of /y*(y*|7r,T, £) without being 
affected by the errors of the measurement system occurring on [0, y°[. 

Second, the stability with respect to excess over threshold operations of 
the generalized Pareto distribution [see, e.g., Castillo and Hadi (1997), page 
1610, or Coles (2001), page 79] and the shifting property of distribution of 
the location family allow to easily determine the distribution of the data 
over a threshold y* . Let Y* + = (Y*\Y* > 0) denote the positive values of 
the model, with Y* + ~ GPD(rj,£) with Tj = //j(l — £). Then we have that 



-V4-1 



This is of particular interest for radio station managers and advertisers, as 
important radio listeners represent the core of their audience. The distribu- 
tion of the listening time over a threshold thus follows a three parameter 
generalized Pareto distribution of parameters a* = y' , r* = T{ — £y* and 
= £. The corresponding expected listening time over a threshold of y' 
minutes, [/,', is then given by 

(13) £ = E[Y* + \Y* + > y'] = Y ^+y'= f , i + r 5y_+y; 

where \i% = \i in simple models without covariates and \x% = exp(x^/3 2 ) in 
models incorporating covariates. The expectation of the positive data over 
a threshold (i.e., the expectation of the data on ]y*,oo[) thus simply cor- 
responds to a linear shift of the expectation of the positive data on ]0,oo[. 
There is therefore no need to change the ZITPo model when one is interested 
in fj,' or, in other words, the effect of the covariates on fj,' is the same as on 

Third, the ZITPo model can easily be extended to the three parameter 
generalized Pareto distribution by introducing a shift parameter y* < y° 
corresponding to a in (1). In (9) and (10) we have that y* = 0. Adding 
the shift parameter makes sense if information below y* is not of direct 
interest, like if nonlisteners and listeners that only zap through a given radio 
are considered alike for the radio broadcaster. The resulting model which 
extends (9) [and consequently (10)] would allow to model the probability to 
get an outcome lower than a given positive value y* as well as the expectation 
of the data over y', with positive outcomes observed above y° . In this case, 
all data lower than y* would be treated as "zeros" in order to be part of 
the clump-at-zero. The density functions of the observed listening times Y 
would then be (for an observation yi) 



fY{yiWi,n*,C) 



1 - vr' 1 + 



C \{y°-y' xx ~ m ' 



S(Vi) 
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(14) 



+ 



7T; 



1 + 



Vi-y 



The parameters ir' and //" can possibly be linked to a set of covariates as 
is done in (10). If there is no y°-truncation and if the data on }y*,y°[ are 
reliable, y° = y' and (14) is reduced to a two-part model since the first 
part of the right-hand side of (14) reduces to (1 — ir*)S(yi). This extension 
is particularly useful when the interest only lies on the tail distribution 
of the positive outcomes. Indeed, in that case ir* is a nuisance parameter 
and the generalized Pareto distributional assumption on ]0, y'[ is no more 
necessary. For the model to fit the data (observed above y°), one only needs 
the assumption that the generalized Pareto distribution holds above y' , with 
a mean that possibly depends on a set of covariates and constant £. This 
might be an interesting setting, for example, in finance when seeking to 
explain the value-at-risk of financial instruments. In these cases, however, 
the choice of y* might become an important issue and criteria based on 
mean squared errors [see, e.g., Hill (1975); Hall and Welsh (1985); Beirlant, 
Vynckier and Teugels (1996)] or prediction errors [Dupuis and Victoria- Feser 
(2006)] could, in principle, be extended to the ZITPo. In what follows, we 
will, however, focus on models with y' = 0. 



3. Estimation and inference. Fitting methods for the generalized Pareto 
distribution in (1) (i.e., without a clump-at-zero) has been of great interest 
in the literature. Castillo and Hadi (1997) and Singh and Ahmad (2004) 
propose a comparative evaluation of the most used classical estimators for 
the two and three parameter distributions. Robust estimators have also been 
developed [Dupuis and Tsao (1998); Peng and Welsh (2001); Juarez and 
Schucany (2004)]. We propose here to use the maximum likelihood estimator 
(MLE). 

The log-likelihood function of the ZITPo model described in (10) is 



K/3i,/3 2 ,e|y,y°,x 1 ,x 2 ) 
= jX>G/i = o)iog 



i=l 



exp(xf 1 /3 1 ) 
1 + exp(x^/3 x 



(15) 



1 + 



l-CJ \exp(x/ 2 /3. 



-Vi 



+ EVd/) 



xf 2 /3 2 



log 



(i-0- 



l + exp(l+x/ 1/ 3 1 ) 



14 



D.-L. COUTURIER AND M.-R VICTORIA-FESER 




Fig. 4. Distribution of the probabilities of positive outcomes, m, and of the expectations 
of positives values, used to simulate ZITPo realizations. 



+ {|M4-M + te)(=^))}- 

Maximization of this expression is achieved using the quasi-Newton method 
with a numerically computed gradient matrix. Convergence is obtained rapidly 
for most of the cases we have tried, even with models embedding many co- 
variates. The use of slightly different starting values did always provide a 
solution to the unusual cases in which we met convergence problems. The 
program is implemented in R functions available in Couturier and Victoria- 
Feser (2010). 

In order to check the finite sample properties of the MLE for the ZITPo 
model, we perform a simulation study of models incorporating covariates 
as in (10). The MLE is computed on samples with three different sample 
sizes of respectively 500, 1000 and 2000 observations, simulated with two 
different values for the shape parameter, £ = 0.25 and £ = 0.5. The sam- 
pling distribution of the MLE are presented by means of boxplots on 2500 
simulated data sets. Horizontal gray lines indicate the position of the true 
parameter values. The coverage levels of 95%-confidence intervals of the form 
[9 - $ _1 (0.975)ag, 9 + $ -1 (0.975)<7g], where $ is the probability function of 
the standard normal distribution and where &q are obtained from the inverse 
of the estimated Hessian matrix, are also indicated. 
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The data are simulated from a ZITPo distribution with parameters 



For the covariates, the same X matrix is used in both parts of the model. 
The first column of X is a column vector of 1 corresponding to the constant. 
The other columns of X were constructed with random values of respec- 
tively a normal, a Poisson, two binomials and an exponential distribution, 
with corresponding regression parameters /3 X = [1, 1, —0.5, 0.5, 0.25, 0.25] T 
and f3 2 = [2, 1,0.5, 0.5, 0.25, 0.25] T . The values of the /3 X and /3 2 were chosen 
in order to obtain asymmetrical distributions for the probabilities of posi- 
tive outcomes, 7Tj, and for the expectations of positives values, fii, as well 
as a positive relationship between these quantities. Figure 4 shows their re- 
spective distributions as well as the chosen relationship between 7Tj and //j. 
With a median of 0.3, the probabilities of positive outcomes, 7Tj, are rather 
low. The expectations of the positives values, fii, have a very asymmetrical 
distribution. The cutting value y° = 0.125 is a fixed value independent of 
i and which approximately corresponds to the quantile 0.1 of the positive 
simulated data. The form of the dependance between 7Tj and \ii is nonlinear. 
The choice of the parameter values vr^, //^ and y° thus corresponds to an 
extreme choice to test the performance of the MLE in nontrivial situations. 

The bottom plot of Figure 5 shows the sampling distribution of the MLE 
of the shape parameter £. The boxplots of the parameters estimates of £ 
show a small underestimation of the parameter value even when the number 
of positive data is around 650 observations which correspond to 30% of the 
maximum sample size of this analysis. Estimation of the shape parameter 
is known to be problematic even with large sample sizes and regardless of 
the estimating method [Hosking and Wallis (1987)]. Our simulations tend to 
show that the bias of the shape parameter both depends on the number of 
observations n and on the number of covariates p, a situation similar to the 
MLE of the parameter a in multiple regression analyses. This also confirms 
the findings of Chavez-Demoulin and Davison [(2005), page 212] for £ in 
their adaptation of generalized additive models to the generalized Pareto 
distribution. 

The upper and centered plots of Figure 5 present the sampling distribu- 
tions of the MLE of /3 1 and f3 2 - Regardless of the sample size, all boxplots 
are well centered around the true value of the parameters and the coverage 
levels of the corresponding confidence intervals are close to the 95% nominal 
value. As j3 2 and the £ are essentially estimated over the positive part of the 
data which represent the 30% of the 500, 1000 and 2000 observations of our 
study, our results appear very satisfactory. Similar results were obtained in 
simulations with £ = 0.5. 



exp(xf 1 /3 1 ) 



and 




T 
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(3\— estimates 




02 —estimates 



Bee 



03 93 ' 93 8* : 03.91 I 



4- 

Be 



500 1000 2000 



500 1000 2000 



93.76% 94 567c 



500 1000 2000 




500 1000 2000 



500 1000 2000 



500 1000 2000 



£— estimates 



91.87 93.48% 
1000 2000 



Fig. 5. Boxplots of the MLE of (3 X (upper plots), (3 2 (centered plots) and £ (bottom 
plot) computed over 2500 datasets simulated from a ZITPo distribution with parameters 
(3 1 = [1,1, -0.5, 0.5, 0.25, 0.25] T , (3 2 = [2, 1,0.5, 0.5, 0.25, 0.25] T an.i£ = 0.25. y° = 0.125 is 
a fixed value which approximately corresponds to the quantile 0. 1 of the positive simulated 
data. Analyses were performed for samples of sizes 500, 1000 and 2000. The horizontal 
gray lines indicate the position of the true parameter values. The coverage levels of confi- 
dence intervals of the form [0 — $ _1 (0.975)(Tg, + (0.975)crg], where $ is the probability 
function of the standard normal distribution, are also indicated. 
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4. Model validation. Residual analyses in the context of models for data 
with excess zeros as described in (2) and (3) may be split in two parts: A 
first one focusing on the distribution that distinguishes the zeros from the 
positive outcomes, and a second one considering the distribution of the pos- 
itive values. In models with covariates, the residuals of the part distinguish- 
ing the zeros correspond to residuals of logistic regressions. As this topic 
is already well covered in the literature [we refer to Collett (2003) for a 
complete overview] , the following subsections focus on the residuals of the 
positive part of the model. We propose a residual type for truncated and 
untruncated generalized Pareto models. Section 5 presents one use of this 
new residual type. 

Let Y* + = (Y*\Y* > 0) denote the positive values of the model and let 
= (Yi\Yi > y°) be the observed truncated positive values. As (Y* + — 
y°\Y* + >y°) = (Y+ - y°) and follows a GPD(/^ + f^,£), let us define the 
ith residual, £j, in the following way: 

Y + — ii° Y + — ii° 

(16) £i = h(Y+-y°) 



E\Y?-y°] Mi + &/7(l-£) 

The residuals distribution, / £4 (ej), may then easily be derived and is given 
by 



feM)= f<Y+~v°)( h X ( £ i)) 



(17) 



i-ev i-e 

Thus, feX^i) ~ GPD( / u = 1,£). The residuals theoretically (i.e., if the ZITPo 
model holds) follow a generalized Pareto distribution of parameters /i = 1 
and £. This result holds also when y° = 0. Note that this result is a finite 
sample result, a pretty rare situation in GLM. A very powerful finite sample 
model validation procedure thus consists in comparing the distribution of 
the estimated residuals to their estimated theoretical distribution. The for- 
mer are obtained by substituting in (16) the parameters by their estimated 
values, that is, 

Y + - y° 

GPD0u = U). 



Ai + &/7(!-0 

QQ-plots should approximately display a straight line when the model ade- 
quately fits the data. 

Finally, the result in (17) offers a fast method to generate random real- 
izations from truncated or untruncated generalized Pareto models. Indeed, 
let u be a random realization of a Uniform(0, 1) and let \x be the vector of 
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expectations of the generalized Pareto distribution. Then, inverting (16) and 
(17) allows to generate y, a random variate of a y°-truncated GPD(//,£), in 
the following way: 



5. Applications to radio audience data. The ZITPo model is applied to 
the audience data of the local radio station "116" in its broadcasting area 
during the weekdays of the second semester of 2007. The data set is avail- 
able in Couturier and Victoria- Feser (2010). The left upper plot of Figure 7 
presents the distribution of the daily listening times of 2155 participants 
measured during one day of this period. The clump-at-zero represents 63% 
of the data. 

The audience indicators of rating and time spent listening are explained by 
a set of categorical variables including the age in 5 classes ([15,25[, [25,35[, 
[35,45[, [45,60[, [60, 120[), the education level in 3 classes (low, mid, high), 
the gender, the time in month and the different zones of the broadcast 
area. The contrasts used to create the k — 1 dummy variables from a k- 
classes categorical variable are of type "treatment" for the variables age, 
gender and education with base "15 to 25 years old men with low education 
level," and of type "Sum" for the geographical zones and the months. The 
model includes interaction between age and gender. Other interactions — like 
between education and age — appeared nonsignificant and did not improve 
the log-likelihood or the residual distribution. 

To protect the parameter estimates of the possible influence of the false 
positive and false zeros observations belonging to the interval [0,5[, we 
choose y° = 4.95. Consequently, we coded the 19 observations belonging to 
the interval [3, 5[ in Figure 7 as zeros and let the ZITPo model adequately 
separate the true from the false zeros as described in the first part of (10). 

The P l and (3 2 estimated values as well as their standard deviations 
are reported in Table 1. The p- values corresponding to the (asymptotic) 
significance tests for (3 1 and /3 2 , that is, 2<3? _1 (— \(3/a^\), are also indicated. 
According to the chosen contrasts, the estimated intercepts /3io and P20 are 
related to the estimated rating and time spent listening of 15 to 25-year-old 
men with a low education level in the broadcast area of interest during the 
second semester of 2007 through, respectively, 



15-25-year-old men living in the broadcast area of interest and having a low 
education level have thus a probability of contact to radio station "116" of 
12% and an average contact length of about 59 minutes during the second 




exp(/3i ) 



0.12 and exp(/3 20 ) = 59. 



1 + exp(/3i ) 
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Table 1 

(3 1 and f3 2 estimated parameters and corresponding standard deviations of the ZITPo 
model applied to the listening times to radio station "116." The p-values are for 
(asymptotic) significance testing of (3 1 and (3 2 - Low p-values are magnified in the 
columns "Sig." by means of (***), (**), (*), (■) respectively corresponding to significant 
tests at the levels of 0.001, 0.01, 0.05 and 0.1 



Rating Average listening time 





$i 


SE 


p- value 


Sig. 


0i 


SE 


p- value 


Sig. 


(Intercept) 


-1.95 


0.32 


<0.001 




4.08 


0.33 


<0.001 


*** 


[25-35 [ 


0.40 


0.39 


0.309 




-0.16 


0.39 


0.680 




[35-45 [ 


0.94 


0.36 


0.008 


** 


0.20 


0.35 


0.568 




[45-60 [ 


1.57 


0.34 


<0.001 




0.40 


0.34 


0.235 




[60-120 [ 


2.22 


0.35 


<0.001 




0.76 


0.34 


0.026 


* 


Women 


-0.25 


0.49 


0.608 




-0.73 


0.49 


0.133 




Educ. middle 


0.18 


0.16 


0.255 




0.01 


0.13 


0.933 




Educ. high 


0.36 


0.12 


0.002 


** 


-0.15 


0.09 


0.103 




July 


-0.15 


0.12 


0.216 




-0.06 


0.10 


0.516 




August 


-0.11 


0.11 


0.346 




0.04 


0.09 


0.695 




September 


0.14 


0.11 


0.225 




-0.07 


0.09 


0.393 




October 


0.18 


0.11 


0.085 




0.01 


0.08 


0.948 




November 


-0.00 


0.11 


0.973 




0.04 


0.09 


0.654 




Zone 2 


-0.26 


0.05 


<0.001 




-0.08 


0.04 


0.049 


* 


Women + [25-35 [ 


-0.02 


0.58 


0.970 




1.26 


0.57 


0.028 


* 


Women + [35-45 [ 


0.18 


0.54 


0.737 




0.71 


0.53 


0.180 




Women + [45-60 [ 


0.03 


0.52 


0.961 




0.90 


0.51 


0.079 




Women + [60-120 [ 


0.39 


0.52 


0.455 




1.11 


0.50 


0.027 


* 



semester of 2007. The estimated distribution of the effective (untruncated) 
positive times of the individuals of this focus group is thus 

Y*\(Y* > 0) ~ GPD(59,f = 0.082). 

Thus, under the model, F^,, Ym>Q J3\59,i) = 0.05 and i ? ( y 1 *| Y * >0) (y°|59,C) = 
0.09 respectively represent for this focus group the estimation of the part 
of effective positive data that is coded as zero by the Swiss measurement 
system and the estimation of the part of the effective positive data that 
was supposed truncated and coded as zero for the estimation. The average 
ratings and time spent listening of other focus groups — like women with a 
high education level — are then shifts of 12% and 59 minutes. Figure 6, for 
example, presents the estimated (untruncated) listening times distributions 
of men with low eduction level conditional to 5 age classes. The probability to 
tune into this radio station strongly depends on the age class. The expected 
listening times are more or less the same for 15-45-year-old men and increase 
then for the two oldest age classes. 
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Fig. 6. Estimated (untruncated) listening times distributions of men with low eduction 
level conditional to 5 age classes. The probability to tune into this radio station strongly 
depends on the age class. The expected listening times are more or less the same except 
for the oldest age class. 

In order to test the significance of each factor (e.g., age), we use the likeli- 
hood ratio test to compare nested models. Let (3 = [/3^,/3^] T be the vector 
of the regression parameters. The LRT statistic can be used to test hypothe- 
ses of the form Ho : (3j 2 ) = against H\ : 0^) [with /3j£\ unspecified] and 
is given by 

LRT = 2[l(0\y, y°,X 1 ,X 2 ) - I0\y, y°, Xi, X 2 )], 

where (3 and respectively denote the full and reduced regression param- 
eters MLE. The LRT statistic follows a ■ distribution under the null 
hypothesis, where p and p are the number of parameters of the full and 
reduced model. 

Table 2 presents the LRT evaluating which variables significantly influence 
the rating and the average listening times. According to the corresponding 
p-values, the variables significantly influencing the average rating of this 
radio station are the age, the education level and the geographical zone in 
the broadcast area. A look at the fli estimates shows that the rating average 
increases with age and education classes and decreases for people living in 
the countryside area named "Zone 2." The variables significantly influencing 
the average listening time to this radio station are the age, the gender and 
area. The listening time average increases for people belonging to high age 
classes and decreases for people living in "Zone 2." The evolution of listening 
time with age is not the same for men and women. The right upper plot of 
Figure 7 shows the form of the link between the estimated average ratings, 
7Tj, and the average listening times, jli: this relationship is approximately 
linear, strong and positive (the correlation is of 0.78). 
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Table 2 

LRT statistics (with corresponding degrees of freedom) and p-values for the marginal 
LRT applied to the listening times to radio station "116." Each variable (or variable plus 
interaction) of the left column is tested in the binomial (Rating) and truncated GPD 
(Average listening time) part of the model. Low p-values are magnified in the columns 
"Sig." by means of (***), (**), (*), (■) respectively corresponding to significant tests at 
the levels of 0.001, 0.01, 0.05 and 0.1 



Rating Average listening time 





T 


Df 


p- value 


Sig. 


T 


Df 


p- value 


Sig. 


Age + age • gender 


236.58 


8 


<0.001 




78.54 


8 


<0.001 


*** 


Gender + age • gender 


3.26 


5 


0.659 




16.08 


5 


0.007 


** 


Education 


9.61 


2 


0.008 


** 


3.41 


2 


0.182 




Month 


5.91 


5 


0.315 




1.53 


5 


0.909 




Zone 


24.67 


1 


<0.001 




3.92 


1 


0.048 


* 


Age • gender 


2.56 


4 


0.634 




8.14 


4 


0.087 





The estimated shape parameter is £ = 0.082 with <j| = 0.039. The shape 
parameter is thus slightly but significantly higher than zero, meaning that 
a GLM with an exponential error distribution, a special case of the ZITPo 
models when £ — > 0, would not have been convenient in this case. The residu- 
als are to be compared to a GPD(1, 0.082). The analysis of the fit is presented 
in the two bottom plots of Figure 7. The QQ-plots of the residuals and of 
their log show a very good adequacy of the model to the data. 

Such conclusions represent a substantial improvement upon the available 
ratings analyses in which point estimations of audience indicators are calcu- 
lated for the desired focus groups mostly without confidence intervals and 
without the possibility to test the importance of a variable compared to 
others. This information thus allows radio stations to properly adapt their 
programming to better correspond to their desired target audience, and ad- 
vertisers to optimize targeted advertising campaigns. 

6. Conclusion. The ZITPo model is a very powerful model that can be 
used, in particular, to analyze radio audience data. Using the truncated ob- 
servations, this model allows to adequately estimate the true proportions 
of nonzero observations and the average of positive values — corresponding 
to the audience indicators of rating and time spent listening — of the un- 
derlying untruncated listening times distribution. The model also allows to 
relate these expectations to covariates in a GLM spirit, providing an explana- 
tory model to audience data. The model validation procedure resulting from 
properties of the generalized Pareto distribution offers a very helpful way to 
judge the adequacy of the model to the data. 
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Fig. 7. Analyses of listening times to radio "116" in its broadcasting area during the 
second semester of 2007. Left upper plot: Empirical distribution function. The number 
of observations is 2155. The clump-at-zero represents the 63% of the data. Right upper- 
plot: Form of the link between the estimated average ratings, iti, and the average listening 
times, fii. Two bottom plots: QQ-plots of the residuals (left) and of the log of the residuals 
(right) of the ZITPo model applied to those data. The ordered residuals are compared to 
the quantiles (left) and to the log of the quantiles (right) of a GPD(1,£ = 0.082). 



Although the main motivation for the development of the ZITPo model 
was the analysis of radio audience data, we believe that it can adequately 
fit a number of data sets which have heavy tails distributions. For example, 
it provides an extension to model (4) for hydrological data, that can include 
covariates to explain the mean level, with y° = 0. 
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SUPPLEMENTARY MATERIAL 

Radio data set and R Code (DOI: 10.1214/10-AOAS358SUPP; .zip). 
The file "data_ZITPo.csv" contains the data set analyzed in Section 5. 
The observations are in rows and the variables in columns. The file "func- 
tions_ZITPo.r" contains R functions that allow to fit and analyze ZITPo 
models. It produces objects of class "zipto." Usual generic functions are 
then available for objects of that class. The file "script_ZITPo.r" contains 
the R Code used to produce the results of Tables 1 and 2 and the plots of 
Figure 7. 
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